Naïve Terminological Annotation of Legal Texts in Slovak
Abstract
Correct automatic terminological annotation of texts in a corpus can sometimes be a challenging task, especially for moderately or heavily inflected languages with relatively free word order. We explore the possibility of simple annotation based on sequence matching of lemmatized texts to annotate Slovak language texts with IATE entries. The accuracy of annotating legal texts is very good when annotating multiword terms, while the accuracy for single-word terms can be increased by applying filters on term lengths and blacklisting the most frequent false positives.
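The matching idea described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been lemmatized by some external tool and that the term base is available as tuples of lemmas. The function name `annotate`, the parameters `min_single_len` and `blacklist`, the greedy longest-match strategy, and the term identifiers `T1` to `T3` are all hypothetical choices made for illustration; they are not taken from the paper or from IATE.

```python
from typing import AbstractSet, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, term id)


def annotate(
    lemmas: List[str],
    terms: Dict[Tuple[str, ...], str],
    min_single_len: int = 4,
    blacklist: AbstractSet[str] = frozenset(),
) -> List[Span]:
    """Greedy longest-match annotation of a lemma sequence (illustrative only)."""
    max_len = max((len(t) for t in terms), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(lemmas):
        match = None
        # Try the longest candidate first so multiword terms win over their parts.
        for n in range(min(max_len, len(lemmas) - i), 0, -1):
            candidate = tuple(lemmas[i:i + n])
            if candidate not in terms:
                continue
            # Single-word terms pass through a length filter and a blacklist.
            if n == 1 and (len(candidate[0]) < min_single_len
                           or candidate[0] in blacklist):
                continue
            match = (i, i + n, terms[candidate])
            break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans


# Hypothetical term base keyed by lemma tuples; ids are made up.
terms = {
    ("európsky", "komisia"): "T1",
    ("zákon",): "T2",
    ("návrh",): "T3",
}
# Lemmas of "Európska komisia predložila návrh zákona", assumed to come
# from an external lemmatizer.
lemmas = ["európsky", "komisia", "predložiť", "návrh", "zákon"]

result = annotate(lemmas, terms, blacklist={"návrh"})
print(result)  # [(0, 2, 'T1'), (4, 5, 'T2')]
```

In this toy run the multiword term is matched as one span, the blacklisted single-word lemma is skipped, and the remaining single-word term passes the length filter. The threshold and the blacklist contents would have to be tuned on real data.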
Similar resources
Detecting Commas in Slovak Legal Texts
This paper reports on initial experiments with automatic comma recovery in legal texts. In deciding whether to insert a comma or not, we propose to use the value of the probability of a bigram of two words without a comma and a trigram of the words with the comma. The probability is determined by the language model trained on sentences with commas labeled as separate words. In the training data...
Semantic Annotation of Legal Texts through a FrameNet-Based Approach
In this work we illustrate a novel approach for solving an information extraction problem on legal texts. It is based on Natural Language Processing techniques and on the adoption of a formalization that allows coupling domain knowledge and syntactic information. The proposed approach is applied to extend an existing system to assist human annotators in handling normative modificatory provision...
Learning from Texts - a Terminological Metareasoning Perspective
We introduce a methodology for concept learning from texts that relies upon second-order reasoning about statements expressed in a (first-order) terminological representation language. This metareasoning approach allows for quality-based evaluation and selection of alternative concept hypotheses. ...
Learning from texts - a terminological metareasoning perspective
We introduce a methodology for concept learning from texts that relies upon second-order reasoning about statements expressed in a (first-order) terminological representation language. This metareasoning approach allows for quality-based evaluation and selection of alternative concept hypotheses. 1 Introduction In this paper, we consider the problem of concept learning from a new met...
Extraction and analysis of proper nouns in Slovak texts
Unknown named entity recognition in inflected languages faces several specific problems: the first and foremost is that the entities themselves are inflected (Dvonč et al., 1966), leading to a problem of identifying word forms as belonging to the same lexeme, and also the problem of finding the correct lemma. In this article we analyse the distribution of word forms for proper nouns in Slovak and ...
Journal
Journal title: Rasprave: Časopis Instituta za Hrvatski Jezik i Jezikoslovlje
Year: 2022
ISSN: 1331-6745, 1849-0379
DOI: https://doi.org/10.31724/rihjj.48.1.2